We present the biologically inspired Ripple Pond Network (RPN), a simply connected 
spiking neural network which performs a transformation converting two dimensional 
images to one dimensional temporal patterns (TP) suitable for recognition by temporal 
coding learning and memory networks. The RPN has been developed as a hardware 
solution linking previously implemented neuromorphic vision and memory structures 
such as frameless vision sensors and neuromorphic temporal coding spiking neural 
networks. Working together such systems are potentially capable of delivering end-to-end 
high-speed, low-power and low-resolution recognition for mobile and autonomous 
applications where slow, highly sophisticated and power hungry signal processing 
solutions are ineffective. Key aspects in the proposed approach include utilizing the spatial 
properties of physically embedded neural networks and propagating waves of activity 
therein for information processing, using dimensional collapse of imagery information into 
amenable TP and the use of asynchronous frames for information binding. 
INTRODUCTION 

How did the earliest predators achieve the ability to recognize 
their prey regardless of their relative position and orientation? 
What minimal neural networks could possibly achieve the task 
of real-time view invariant recognition, which is so ubiquitous 
in animals, even those with minuscule nervous systems (Van 
der Velden et al., 2008; Avargues-Weber et al., 2010; Tricarico 
et al., 2011; Gherardi et al., 2012; Neri, 2012), as evidenced by 
mimetic species (Figure 1), yet so difficult for artificial systems 
(Pinto et al., 2008)? If realized, such a minimal solution would be 
ideal for today's autonomous imaging sensor networks, wireless 
phones, and other embedded vision systems which are coming up 
against the same constraints of limited size, power and real-time 
operation as the earliest sighted predators. 

In this paper we present an element of such a simple net- 
work. We call this subsystem the Ripple Pond Network (RPN). 
The name Ripple Pond alludes to the network's fluid-like rip- 
pling operation. The RPN is a feed-forward time-delay spiking 
neural network with static unidirectional connectivity. The net- 
work is responsible for the transform of centered input images 
received from an upstream salience detector into spatio-temporal 
spike patterns that can be learnt and recognized by a downstream 
temporal coding memory network. 

THE VIEW INVARIANCE PROBLEM 

In the 2005 book, "23 Problems in Systems Neuroscience" (van 
Hemmen and Sejnowski, 2005), the 16th problem in systems neu- 
roscience, as posed by Laurenz Wiskott, is the view invariance 
problem. The problem arises from the fact that any real world 3D 
object can produce an infinite set of possible projections on to the 
2D retina. Leaving aside factors such as partial images, occlusions, 
shadows, and lighting variations, the problem comes down to the 
shift, rotation, scale, and skew variations. How then, with no 
central control, could a network like the brain learn, store, classify 
and recognize in real time the multitude of relevant objects in its 
environment? 

Generally, most biologically based object recognition solutions 
have been based on vertebrate vision, in particular mammalian 
vision, and have used either statistical methods (Sountsov 
et al., 2011; Gong et al., 2012), signal processing techniques 
(such as log-polar filters) (Cavanagh, 1978; Reitboeck and 
Altmann, 1984), artificial neural networks (i.e., non-spiking 
neural networks) (Nakamura et al., 2002; Norouzi et al., 2009; 
Iftekharuddin, 2011), or, more recently, spiking neural networks 
(Serre et al., 2005; Rasche, 2007; Serrano-Gotarredona et al., 2009; 
Meng et al., 2011). 

Since many of the approaches above are based on mammalian 
vision and strive to achieve the accuracy and resolution of mam- 
malian vision, they are very complex and can only be truly 
implemented on computers (Nakamura et al., 2002; Serre et al., 
2005; Jhuang et al., 2007; Norouzi et al., 2009; Iftekharuddin, 
2011; Meng et al., 2011; Gong et al., 2012), sometimes with very 
slow computation times. Other implementations that have been 
demonstrated on hardware (Rasche, 2007; Serrano-Gotarredona 
et al., 2009; Folowosele et al., 2011) have been successful in 
proving that vision can be achieved for small, low-power robots, UAVs, 
and remote sensing applications. 

The few models of invertebrate visual recognition have had an 
explanatory focus (Horridge, 2009; Huerta and Nowotny, 2009) 
and, not being developed for the purposes of hardware imple- 
mentability, assume highly connected networks not suitable for 
hardware. 
CMOS implementations of bio-inspired radial image sensors 
are most closely related to the RPN. However, when it comes to 
recognition, these sensors ultimately interface with conventional 
processors (Pardo et al., 1998; Traver and Bernardino, 2010) 
rather than spiking neural networks, as is the case with the RPN. 

BEGINNING AT THE END: 2D RATE CODING MEMORY vs. 1D TEMPORAL 
CODING MEMORY 

In all the previously listed approaches to the view invariance 
problem, the learning and memory system of the network responsible for 
recognition has had a 2D structure. This approach is intuitive. 
A memory system that matches the input channel (the retina) 
in dimension makes sense from an engineering perspective. A 
2D signal should interface with a 2D memory system. However, 
from a biological perspective there is no evidence for the exis- 
tence of such a 2D or grid structured memory system in any 
organism. Furthermore, and more critically from a computational 
perspective, the standard 2D memory approach necessitates the 
precise alignment of perceived objects to a stored canonical 2D 
template in terms of their scale, position and angle, a 
computationally centralized and biologically implausible operation. The 
mechanisms by which this matching of a 2D image to a 2D template 
is accomplished make up a significant portion of the field of 
machine vision. This 2D model of visual memory sits in contrast 
to the more general model of memory as used by computational 
neuroscientists not focused on vision, and in particular those 
specializing in memory systems. 

FIGURE 1 | Batesian mimicry: The highly poisonous pufferfish, 
Canthigaster valentini (top) and its edible mimic Paraluteres prionurus 
(bottom). The remarkable degree of precision in the deception reveals the 
sophisticated recognition capabilities of the neural networks of local 
predatory reef fish (Caley and Schluter, 2003). These networks, despite 
being orders of magnitude smaller than those of primates, seem capable of 
matching human vision in performance and motivate the investigation of 
very simple solutions to the problem of visual recognition. (Note: the 
dorsal, anal and pectoral fins are virtually invisible in the animals' natural 
environment). 

Among the latter group, temporal coding networks have been 
proposed in the last two decades as biologically plausible and 
computationally useful models (Jaeger, 2001; Maass et al., 2002; 
Izhikevich, 2006; Tapson et al., 2013). A temporal coding memory 
network is a type of spiking neural network which uses 
spatio-temporal patterns of spikes to represent information 
asynchronously, whereas classic artificial neural networks discard 
timing information by modeling neuron firing rates sampled 
synchronously by a central clock. This additional use of 
asynchronous temporal information results in greater energy 
efficiency and speed (Levy and Baxter, 1996; Van Rullen and Thorpe, 
2001), motivating realization of the model in hardware (Wang, 
2010; Wang et al., 2011; Hussain et al., 2012). 

In this model a neuron can be seen as a memory unit which 
learns and stores via its dendritic weights and delays a particular 
spatio-temporal pattern. 

This temporal coding memory model is a content addressable, 
distributed network comprising many spiking neurons 
connected to each other via multiple pathways. The network, 
through dynamic adaptation of synaptic weights ($W_1$, $W_2$, $W_3$ 
in Figure 2) and decaying synaptic kernels such as alpha 
functions with time constants ($\tau_1$, $\tau_2$, $\tau_3$ in Figure 2), makes particular 
neurons exclusively responsible for particular inter-spike 
intervals. It achieves this by continuously adapting its parameters to 
maximize recognition at its output in response to the statistics 
of its input. A longer spatio-temporal pattern can be stored in 
such a network of neurons by the addition of cascading neurons 
and in turn learning these weights and time constants 
(Paugam-Moisy et al., 2008; Ranhel, 2012). Subsequently the network's 
"output" can be measured as the relative activation of certain 
neurons which individually or in concert indicate the recognition 
of a certain pattern. 
FIGURE 2 | (A) Typical model of a single element in a distributed 
temporal coding memory network with synaptic alpha functions used as 
decaying synaptic kernels producing a decaying memory of recent 
spikes. (B) Biological representation of the same element. Through 
adaptation of synaptic weights and kernels a specific spatio-temporal 
pattern is learnt by the neuron. (C) Flipping the pattern, as would 
happen if a 2D image were rotated by 180 degrees, results in the 
pattern not being recognized. 
However, just as with rate coding models, temporal coding 
memory systems tasked with recognition cannot directly 
interface with the retina, since they also expect their learnt patterns 
to appear via the same channels every time (see Figure 2). 
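
As a minimal sketch of the single memory element in Figure 2, the following Python models three input channels, each with a weight and an alpha-function kernel. The weights, time constants, spike times and threshold are illustrative assumptions rather than values from this work, and the alpha function is written in a normalized form that peaks at 1.0.

```python
import math


def alpha_kernel(t, tau):
    """Alpha function: zero before the spike, peaking at 1.0 when t == tau."""
    if t < 0:
        return 0.0
    return (t / tau) * math.exp(1.0 - t / tau)


def peak_response(spike_times, weights, taus, t_end=100.0, dt=0.5):
    """Peak of the weighted sum of kernel responses, one spike per channel."""
    best = 0.0
    for k in range(int(t_end / dt) + 1):
        t = k * dt
        v = sum(w * alpha_kernel(t - s, tau)
                for s, w, tau in zip(spike_times, weights, taus))
        best = max(best, v)
    return best


# Assumed parameters: the kernels are tuned so that all three peak at t = 30.
WEIGHTS = [1.0, 1.0, 1.0]
TAUS = [30.0, 20.0, 10.0]
THRESHOLD = 2.5

learnt = [0.0, 10.0, 20.0]   # spike time on channels 1, 2, 3
flipped = learnt[::-1]       # same spikes arriving on the opposite channels

peak_learnt = peak_response(learnt, WEIGHTS, TAUS)
peak_flipped = peak_response(flipped, WEIGHTS, TAUS)
print(peak_learnt >= THRESHOLD, peak_flipped >= THRESHOLD)
```

With these assumed values the learnt pattern makes all three kernels peak together, while the flipped pattern, as in Figure 2C, spreads the peaks out in time and stays below the threshold.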

Motivated by and working backwards from this time-based 
model of memory we propose the RPN system which instead of 
attempting to align a 2D image to a 2D template, converts the 
2D image to a 1D Temporal Pattern (TP) suitable for temporal 
coding memory networks. We then extend the system by using 
multiple RPNs in parallel each sensitive to particular features. 
These parallel RPNs convert the 2D image to an M-dimensional 
spatio-temporal pattern. Thus the proposed RPN system can 
be viewed as a simple neuronal transformation which connects 
the 2D retinotopically mapped inputs to a biologically plausible 
temporal coding memory model. 

MATERIALS AND METHODS 
THE RPN NETWORK 

A central aim of the hardware oriented RPN approach is to obtain 
the most functionality from a minimally connected network. The 
limiting factor of connectivity, though far less significant in biol- 
ogy, frequently constrains hardware implementations and yet is 
not often considered in the development of artificial neural 
network algorithms. Approaches which consider such limitations 
at the design stage facilitate efficient hardware implementation 
(Sivilotti, 1991; Hall et al., 2004; Furber et al., 2012). 



AN UPSTREAM SHIFT INVARIANT SALIENCE NETWORK 

As shown in Figure 3, the RPN receives an image as the spatio- 
temporal, high-pass filtered activation pattern of neurons on 
a conceptual 2D sheet representing the field of attention that 
has been produced by an upstream salience detection system. 
By using a sliding window of attention and focusing it onto a 
single salient object at a time the salience detector effectively 
allows the overall network to operate in a shift invariant man- 
ner. The field of computational and biologically-based salience 
detection is extensive with a wide range of models, techniques, 
and approaches (Itti and Koch, 2001; Vogelstein et al., 2007; 
Gao and Vasconcelos, 2009; Drazen et al., 2011). The proposed 
RPN system does not require any specific salience model; a 
centered input image is its only requirement. For simplicity, 
however, we may assume the upstream salience network 
to consist of only a motion detector, which physically fixes 
the creature's gaze onto a moving object. In fact, this 
simplified system is not far off the mark in the case of many 
organisms (Dill et al., 1993; Land, 1999) and may serve in 
robotics applications where energy and hardware are also limiting 
factors. 

INPUT IMAGES RIPPLE INWARDS ON THE RPN DISC 

After centering by the salience network, the incoming image 
stimulates the neurons distributed on a disc. The disc consists of $\Phi$ 
arms and $N$ neurons per arm as shown in Figure 3. 
FIGURE 3 | The spiral RPN system diagram: raw image to TP. (A) 
Pre-processing stage: feature(s) are extracted from the raw image and input to a 
salience network, which detects the most salient region and translates it to 
the RPN aperture. (B) The image is then projected onto the RPN disc, with 
the RPN taking in one frame at a time via its inhibitory feedback neuron (inh). 
(C) The projected image is then processed via the disc's unidirectional, inwardly 
pointing arms. The image collapses inward toward the center along the 
arms, where it is summed by the Summing Neuron. (D) The output of the 
summing neuron is an integer valued temporal pattern which can be 
processed by memory. For visual clarity the disc illustrated comprises 
only thirty arms ($\Phi = 30$) and ten neurons per arm, $n = 0 \ldots 9$, with $n = 0$ 
being the central neuron. 
Functionally these neurons represent simple binary relays with 
fixed unit delays between them. More complex neuron models 
could also be used but this would incur an increased hardware 
cost (Indiveri et al., 2011). The neurons have one outbound 
connection to the next neuron on their arm. This connectivity is 
unidirectional, pointing inwards toward the center of the disc, and 
each neuron $a_{n-1,\varphi}$ can be activated at time $t$ if its upstream 
neuron (on the same spiral arm but further away from the disc 
center), $a_{n,\varphi}$, transmits a pulse to its input at $(t - \Delta t)$. Thus, 
starting from the disc edge ($n = N - 1$), the neuronal connections 
radiate inward to the central neurons on the disc ($n = 1$). 
This inward connectivity creates a rippling effect and, since the 
neurons act as relays with a small delay, stimulation of a few 
neurons at the edge of the disc produces a small wave of activation 
which travels inward along the disc arms, stimulating succeeding 
neurons in turn and ending at the disc center [see Equation (1)]. 

For $n = 1 \ldots N - 1$ and $\varphi = 1 \ldots \Phi$: 

$$a_{n-1,\varphi}(t) = a_{n,\varphi}(t - \Delta t) \tag{1}$$ 

where $a_{n,\varphi}(t)$ is the activity on the $n$th neuron on arm $\varphi$ at time $t$. 
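
The relay behavior of Equation (1) can be sketched as a shift of every ring one position inward per time step. In this illustrative Python, the list-of-lists layout `disc[n][phi]`, the function name `ripple_step` and the tiny disc size are assumptions made for the example; the neurons are treated as ideal binary relays with unit delay.

```python
def ripple_step(disc):
    """Advance the disc by one time step following Equation (1).

    disc[n][phi] holds the binary activity of neuron n on arm phi,
    with n = 0 at the centre and n = N - 1 at the edge.
    """
    n_arms = len(disc[0])
    # Each ring takes the previous activity of the ring just outside it.
    inner = [ring[:] for ring in disc[1:]]
    # Nothing lies outside the edge ring, so it becomes silent (assumption:
    # no new sensor input arrives while the frame is being processed).
    return inner + [[0] * n_arms]


# Hypothetical disc with N = 4 rings and Phi = 3 arms, one active edge neuron.
disc = [[0, 0, 0] for _ in range(4)]
disc[3][1] = 1

history = [disc]
for _ in range(4):
    history.append(ripple_step(history[-1]))

for t, state in enumerate(history):
    print(t, state)
```

The single pulse starts at the edge of arm 1, moves one ring inward per step, reaches the center after three steps and has left the disc after four.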

A SUMMING NEURON OUTPUTS A TEMPORAL PATTERN 

The innermost ring of neurons ($n = 1$) as well as the single central 
neuron ($n = 0$) all connect to a common Summing Neuron (red 
sigma in Figure 3) that outputs an integer-valued TP that can be 
represented as an $N$ element vector [Equation (2)]. 

$$TP_{sum}(t) = \sum_{\varphi=1}^{\Phi} \left[ a_{0,\varphi}(t - \Delta t) + a_{1,\varphi}(t - 2\Delta t) \right] \tag{2}$$ 

where: 

$TP_{sum}(t)$ is the TP output of the summing neuron, 

$\Phi$ is the total number of arms on the disc, 

$a_{1,\varphi}(t)$ is the activity of the innermost ring of neurons ($n = 1$) 
on arm $\varphi$ at time $t$, 

$a_{0,\varphi}(t)$ is the activity of the central neuron ($n = 0$). 
The summing neuron sums the activity of the disc's central 
neurons ($n = 1$ and $n = 0$). Where the activity of the RPN neurons 
is digital, as is the case here, the summing neuron outputs an 
integer valued spike at every time step, generating the TP that is 
the RPN's output. From the geometry of the disc it is clear that 
this output TP is rotationally invariant. More subtly, as the 
neuron distribution on the disc is uniform, the TP resulting from 
a scaled object is a rescaled (in time and in magnitude) version 
of the TP produced from the original object. This dimensional 
collapse of the object's rotational and scale variance into a TP 
greatly simplifies the task of recognition by downstream memory 
systems. 
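
The rotational invariance of the TP can be illustrated with a small, simplified simulation. The sketch below assumes straight, evenly spaced arms (so that a rotation by one arm spacing is a cyclic shift of the arm index), reads only the central ring at each step and ignores the extra delays in Equation (2); the function names and the test image are hypothetical.

```python
def temporal_pattern(disc):
    """Return the TP as the count of active centre neurons at each step.

    Simplification (an assumption of this sketch): the summing neuron reads
    only ring n = 0 and the extra delays in Equation (2) are ignored.
    """
    n_rings, n_arms = len(disc), len(disc[0])
    state = [ring[:] for ring in disc]
    tp = []
    for _ in range(n_rings):
        tp.append(sum(state[0]))
        state = state[1:] + [[0] * n_arms]   # ripple one step inward
    return tp


def rotate(disc, k):
    """Rotate the image by k arm spacings (straight, evenly spaced arms)."""
    k %= len(disc[0])
    return [ring[-k:] + ring[:-k] if k else ring[:] for ring in disc]


# Hypothetical binary image on a disc with N = 4 rings and Phi = 6 arms.
image = [
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
]
tp_original = temporal_pattern(image)
tp_rotated = temporal_pattern(rotate(image, 2))
print(tp_original, tp_rotated)
```

Because a rotation only permutes activity within each ring, the per-step sums are unchanged. On the spiral disc described in this paper the arms are offset from ring to ring, so this cyclic-shift argument is only an idealization of the geometry.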

AN INHIBITORY NEURON ACTS AS AN ASYNCHRONOUS SHUTTER 

Recall that in addition to receiving inputs from their outer 
neighbor, all neurons are also sensitive to an incoming image. 
Here the neurons function as radially distributed, inwardly con- 
nected pixels on a circular retina such that any pixel/neuron could 
be activated either via its outer neighbor or from its correspond- 
ing sensor. This double activation path means that if there is con- 
tinuous input from the incoming image, say via an asynchronous 
frameless vision sensor or an actual biological retina, the infor- 
mation carried on the disc during the processing phase will be 
corrupted and the generation of unique TPs made impossible. 

In order to prevent the corruption of the RPN's rippling oper- 
ation by new input images some form of shuttering is required. 
One way to control the projection of new image frames onto the 
disc is via a periodic enable signal which enables image projection 
at $\Delta t \times N$ time intervals, ensuring that the activation due to the 
last frame has cleared the disc, as shown in Figure 4A. The drawback 
of this approach, however, is that if the projected object size is 
smaller than the disc (which is almost always the case), significant 
time is wasted in processing the empty outer regions of the disc, 
during which time new incoming information could potentially 
be lost. 

A more efficient approach is the use of an asynchronous shut- 
ter. To this end, in addition to being sensitive to an incoming 
image and synapsing onto their inner neighbors along the disc 
arm, all neurons on the disc also connect via a third path to 
an inhibitory neuron (green neuron labeled inh in Figure 4B) 
such that the inhibitory neuron carries information about the net 
activity of all neurons on the disc. In a hardware context this sig- 
nal may simply correspond to the net power consumption of the 
disc. 

$$Inh(t) = \sum_{\varphi=1}^{\Phi} \sum_{n=0}^{N-1} a_{n,\varphi}(t) \tag{3}$$ 

where: 

$Inh(t)$ is the output of the inhibitory neuron. 
The output of the inhibitory neuron feeds back inhibitively onto 
the visual pathway carrying the image to the disc. The inhibitory 
neuron blocks this pathway, preventing further transmission of 
the image. In this way the neuron ensures that as long as any 
activity remains on the disc (i.e., while the RPN is processing the 
image) no new image will be projected onto it (see Figure 4B). 
The drawback of both these solutions is that their sharp 
sampling operations are not biologically plausible. A more nuanced 
solution involves the use of laterally inhibitive pathways to bunch 
neural signals into frame-like wavefronts (Brunel, 2000; Hamilton 
and Tapson, 2011; Afshar et al., 2012; McDonnell et al., 2012); we 
consider this solution in the later discussion section. 

FIGURE 4 | Two frame generation approaches: (A) A periodic enable 
signal projects new frames onto the RPN. (B) The inhibitory neuron is 
connected to all neurons on the disc. As the disc activation collapses 
inward along the arms and leaves the disc via the summing neuron, the 
total activation reaching the inhibitory neuron also falls. Once the disc 
activation reaches zero, the path of the input image is unblocked, allowing 
the next frame to be projected. As the target object moves away and the 
incident image becomes smaller, it takes less time for the activation to 
clear the disc, disabling the inhibitory neuron sooner and projecting the 
frame. In this way the RPN frame rate varies dynamically to maximize TP 
generation. Note that in the RPN disc shown, $\Phi = 8$ (arms) and $N = 4$ 
(neurons per arm). 

FIGURE 5 | (A) Generating uniform global and local neuron density in a 
radially symmetric distribution via an adaptive algorithm that varies the 
angular offset of new neurons ($\beta_n$) such that the distance to the nearest neighbor is 
maximized results in a spiral structured disc. The disc is shown with ($\Phi = 8$, 
$N = 4$). (B) Spiral propagating waves of neural activity on the chicken retina 
due to excitation. Image from (Yu et al., 2012). (C) The spiral structure at 
larger scales: RPN disc with ($\Phi = 8$, $N = 128$). 
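
To make the difference between the periodic and asynchronous shutters concrete, the following sketch counts how many time steps pass before the total disc activity of Equation (3) reaches zero. The helper names, the disc dimensions and the idealized "object" (a fully active inner region) are assumptions chosen for illustration only.

```python
def make_frame(n_rings, n_arms, object_rings):
    """Binary disc whose innermost `object_rings` rings are fully active."""
    return [[1 if n < object_rings else 0] * n_arms for n in range(n_rings)]


def steps_until_clear(disc):
    """Time steps until Inh(t), the summed disc activity, falls to zero."""
    n_arms = len(disc[0])
    state = [ring[:] for ring in disc]
    steps = 0
    while sum(sum(ring) for ring in state) > 0:    # Inh(t) of Equation (3)
        state = state[1:] + [[0] * n_arms]         # activity ripples inward
        steps += 1
    return steps


N_RINGS, N_ARMS = 8, 4                 # hypothetical disc size
periodic_frame_steps = N_RINGS         # periodic shutter always waits N steps
small = steps_until_clear(make_frame(N_RINGS, N_ARMS, 3))
large = steps_until_clear(make_frame(N_RINGS, N_ARMS, 8))
print(periodic_frame_steps, small, large)
```

In this toy setting an object covering only the three innermost rings clears the disc in three steps, whereas the periodic enable signal would still wait the full eight, which is the time saving the asynchronous shutter is intended to provide.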

NEURON PLACEMENT ALGORITHM: UNIFORM DISTRIBUTION vs. 
LOG-POLAR/SPACE VARIANT DISTRIBUTIONS 

It may be apparent by inspection that the RPN is invariant to rota- 
tion due to the radial symmetry of the disc in a fashion similar 
to log-polar sensor placement schemes. However, in contrast to 
log-polar and other space variant schemes, the density of neu- 
rons on the RPN disc remains approximately uniform as we move 
away from the center. This symmetric but uniform distribution 
was achieved by placing the $n$th neuron on each arm at a 
distance $r_n = \sqrt{n}$ from the center and spiraling the arms at each step 
$n$ by an offset angle $\beta_n$. A search algorithm was used to 
determine $\beta_n$ for each new ring of neurons. Random angular offsets 
were tried (1000 trials/$n$) and the minimum distance to previously 
placed neurons calculated. The trial with the largest minimum 
distance was selected for each $n$. The resulting neuron 
distributions from this randomized algorithm display highly structured 
spiral forms. 
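
The randomized placement search described above can be sketched in a few lines of Python. This is an illustrative reading, not the implementation used for the figures: it treats $\beta_n$ as an absolute angular offset for ring $n$, places a single central neuron at the origin for $n = 0$, and uses fewer trials than the 1000 per ring quoted in the text so that it runs quickly. All function and parameter names are assumptions.

```python
import math
import random


def ring_positions(n, beta, n_arms):
    """Positions of ring n: radius sqrt(n), arms evenly spaced, offset beta."""
    r = math.sqrt(n)
    step = 2.0 * math.pi / n_arms
    return [(r * math.cos(beta + p * step), r * math.sin(beta + p * step))
            for p in range(n_arms)]


def place_neurons(n_arms=8, n_per_arm=4, trials=200, seed=0):
    """Pick, for each ring, the random offset with the largest minimum gap."""
    rng = random.Random(seed)
    placed = [(0.0, 0.0)]          # assumed single central neuron (n = 0)
    offsets = [0.0]
    for n in range(1, n_per_arm):
        best_beta, best_gap = 0.0, -1.0
        for _ in range(trials):
            beta = rng.uniform(0.0, 2.0 * math.pi / n_arms)
            # Minimum distance from the trial ring to all earlier neurons.
            gap = min(math.hypot(x - px, y - py)
                      for x, y in ring_positions(n, beta, n_arms)
                      for px, py in placed)
            if gap > best_gap:
                best_beta, best_gap = beta, gap
        offsets.append(best_beta)
        placed.extend(ring_positions(n, best_beta, n_arms))
    return offsets, placed


offsets, placed = place_neurons()
print(len(placed), [round(b, 3) for b in offsets])
```

Because every ring is symmetric under rotation by one arm spacing, offsets only need to be sampled within a single spacing of $2\pi/\Phi$.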

Such spiral structural symmetry as well as the spreading of 
wave-like activation has been observed in the visual pathways of a 
range of animals from the retina to the higher layers of the visual 
cortex (see Figure 5B), suggesting possible utility in visual 
processing (Dahlem and Muller, 1997; Huang et al., 2004; Wu et al., 
2008; Dahlem and Hadjikhani, 2009). In the context of artificial 
systems the use of wave-like dynamics for computation and recog- 
nition has only recently begun (Adamatzky et al., 2002; Fernando 
and Sojakka, 2003; Maass, 2007; Izhikevich and Hoppensteadt, 
2009). 

In contrast, log-polar approaches to vision have developed over 
several decades (Cavanagh, 1978; Reitboeck and Altmann, 1984). 
Yet these approaches have had the critical flaw of being 
particularly sensitive to centering, a problem demonstrably absent in 
biology. The problem with the log-polar solution is that it represents a 
local minimum in the solution space. Its space variant distribution 
provides a useful automatic scale invariance but critically closes 
the path to extension with respect to translation invariance, since 
the non-uniform distribution cements the non-uniform behavior 
of the system in response to a translated image. Furthermore, in 
the context of biological plausibility, the central assumption used 
by advocates, that the retino-cortical mapping of the mammalian 
visual system represents a mathematical log-polar transformation 
(Traver and Bernardino, 2010), is subject to controversy in the 
neuroscience community (van Hemmen and Sejnowski, 2005), 
as it fails to explain either off-center recognition or the fact that 
the fovea, which represents the central 2° of the visual field and 
is primarily responsible for object recognition, has a uniform 
retino-cortical mapping (Gattass et al., 2005). 

The use of a uniform distribution on the other hand not only 
represents a more efficient use of available sensor/neuron space 
(a critical factor both in hardware and in biology), and a more 
accurate representation of the part of the visual system actually 
responsible for recognition, but most importantly keeps open the 
path toward a general solution that is invariant to all sources of 
variance including translation. 

TIME WARP INVARIANCE IN MEMORY ENABLING SCALE INVARIANCE 
IN VISION 

One of the important capabilities of temporal coding memory 
systems is the recognition of the same pattern when presented at 
different speeds and magnitudes (Kohonen, 1982; Paugam-Moisy 
et al., 2008; Gütig and Sompolinsky, 2009; Carandini and Heeger, 
2012; Tapson and van Schaik, 2013). Such systems can use temporal 
cues embedded in the input signal, or a separate signal carrying 
normalization information, to modify the internal parameters of 
their dynamic systems, such as the time constants of synaptic 
kernels. Slowing down or speeding up the system in this way 
calibrates its speed of operation to the signal and achieves 
invariance to signal speed (Figures 6A,B). This scheme, often 
described as shunting inhibition 
(Koch et al., 1983; Volman et al., 2010), is a fundamental 
element in neuro-computation present in many neural systems and 
responsible for tasks such as enhancement of signal to noise ratio, 
control of signal propagation speeds and control of the dynamic 
range of neural signals (Wills, 2004; Carandini and Heeger, 2012). 

One of the consequences of the uniform distribution of 
neurons on the RPN disc is that rescaled input images produce TPs 
which are rescaled temporally and in magnitude as shown in 
Figures 6C,D. This is because, relative to a larger image, a smaller 
image activates fewer neurons on the disc and the 
neurons activated are correspondingly closer to the center than 
for a larger image, so the wavefront of activity arrives sooner at 
the summing neuron. Thus the resulting 
TP is a time and magnitude scaled version of the original which 
can be recognized by a time-warp invariant temporal coding 
memory network. 

FIGURE 6 | Time warp invariant memory network and the RPN's scale 
invariance: (A) The memory network learns a particular spatio-temporal 
pattern. (B) The memory network recognizes a time warped version of 
the learnt pattern. (C) The RPN system generates a spatio-temporal 
pattern from the projected image of reef fish via simple color based 
feature extractors. The magnitude m of the initial activation of the inhibitory 
neuron carries total activity due to the image, greatly simplifying the 
normalization operation of the memory network. (D) A smaller version of the 
image takes less time to collapse and leave the disc, generating a time 
warped version of the original TP. 

In this context the RPN approach is particularly useful since 
it provides a ready normalization signal via the initial activation 
level of the inhibitory neuron, m in Figures 6C,D, which corre- 
sponds to the initial activation of every neuron on the RPN discs. 
The rising edge of the inhibitory neuron not only signals the exact 
start time of the TPs but its initial magnitude m representing the 
size of the image can be used by the time-warp memory to achieve 
normalization via an inverse relationship to the memory system's 
time constant (Gütig and Sompolinsky, 2009). Furthermore since 
the inhibitory signal arrives immediately after image projection it 
can be sent directly to memory before the first segments of the TPs 
arrive. 

It is worth noting that in contrast to the uniform distribution 
scheme described, an RPN disc with a log-polar neuron distribu- 
tion would, given an image at two different scales, generate the 
same TP, enabling the system to interface with temporal coding 
memory networks that are not time-warp invariant. This advan- 
tage however, as detailed earlier, is outweighed by the unsuitability 
of the log-polar scheme for extension to a more general transla- 
tion invariant solution. 

MULTIPLE PARALLEL HETEROGENEOUS DISCS RESULT IN HIGHER 
SPECIFICITY 

A drawback of the collapse of a feature rich 2D image into a TP is 
that information can be lost. To counteract this loss of informa- 
tion, the simple RPN system can be extended such that instead of 
using a single disc, the input image can be projected onto multiple 
discs operating in parallel each of which extracts different feature 
maps from the raw image. The simplest features can be extracted 
at the sensor level; these include color, motion, and intensity. More 
complex features must be extracted from the spatial properties 
of the simpler feature maps. Discs with heterogeneous 
connectivities, densities and dynamics can generate multiple complex 
feature maps such that the incident image can be processed into 
an array of independent TPs, the combination of which is unique 
for every object. Some examples include introduction of cross talk 
or coupling between the discs' neurons to effectively produce 
filters of different spatial frequencies, the use of discs with different 
neuron densities and use of hardware implemented Gabor filters 
[analogous to orientation sensitive hypercolumns in the visual 
cortex (Bressloff et al., 2002; Dahlem and Chronicle, 2004)] to 
create orientation sensitive feature maps (Choi et al., 2005; Shi 
et al., 2006; Chicca et al., 2007). Below we describe in more detail 
the last two examples and how they may be useful. 

Orientation sensitive features represent a special case for the 
RPN. To function, the RPN and all pre-processing systems 
preceding it must be rotationally invariant, yet orientation sensitive 
feature extractors such as Gabor filters, which are a critical 
element of any recognition system, providing salient cues that in 
combination are unique for different objects, operate on 
Cartesian coordinates. If Gabor filters that use Cartesian 
coordinates preceded the RPN, the resulting feature maps would be 
sensitive to rotation as shown in Figure 7A. A simple solution to 
this problem is to first transform the image into polar coordinates 
and then perform Cartesian Gabor filtering. This is the 
standard approach used in log-polar based solutions; however, this 
transformation and the subsequent filtering operations involve a 
central processor, which is not biologically plausible. An 
alternative solution is the use of radial Gabor filters, which group features 
based on their orientation relative to the disc center as shown in 
Figure 7 and Equation (4). 

$$\beta = \operatorname{atan2}(y, x), \qquad \theta = \alpha + \beta \tag{4}$$ 
where, 

$\alpha$ is the radial orientation of the radial Gabor function, 
$\beta$ is the angle of the position of the center of the Gabor kernel 
relative to the center of the disc (as shown in Figure 5), 
$\theta$ is the orientation of the standard Cartesian Gabor function. 
This approach has the benefit of merging the two steps into 
one, potentially delivering a significant speed advantage while 
avoiding the biologically implausible Cartesian to polar 
coordinate transform and, most importantly, leaves open the option 
of extending the RPN in a distributed manner where 
information about a dynamic center of attention can be used locally to 
generate rotationally invariant, yet information preserving feature 
maps. 

FIGURE 7 | Standard orientation sensitive feature extractors cannot 
precede the RPN: (A) Feature extraction via Cartesian Gabor 
filters, $G(\alpha)$, group features into feature maps based on their 
orientation relative to the Cartesian coordinate system. The 
feature maps are input to the RPN, however, the rotational 
variance of the feature maps eliminates the RPN's rotation 
invariance. (B) In contrast, radial Gabor filters, $g(\alpha)$, group features 
into feature maps based on their orientation relative to polar 
coordinates. These feature maps do not exhibit rotational variance 
and can interface with the RPN. 
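
A small numerical check illustrates why grouping features by their orientation relative to the disc center removes rotational variance. The sketch assumes Equation (4) is read as $\theta = \alpha + \beta$ with $\beta = \operatorname{atan2}(y, x)$, so that the radial orientation is $\alpha = \theta - \beta$; the function names and the example feature are hypothetical.

```python
import math


def radial_orientation(x, y, theta):
    """Orientation relative to the disc centre: alpha = theta - beta."""
    beta = math.atan2(y, x)              # angle of the kernel position
    return (theta - beta) % (2.0 * math.pi)


def rotate_feature(x, y, theta, gamma):
    """Rotate an oriented feature about the disc centre by angle gamma."""
    c, s = math.cos(gamma), math.sin(gamma)
    return x * c - y * s, x * s + y * c, theta + gamma


# Hypothetical feature at (1, 0) with Cartesian orientation of 45 degrees.
x, y, theta = 1.0, 0.0, math.pi / 4
alpha_before = radial_orientation(x, y, theta)
alpha_after = radial_orientation(*rotate_feature(x, y, theta, math.pi / 3))
print(alpha_before, alpha_after)
```

Rotating the image changes both the feature's Cartesian orientation and the angle of its position by the same amount, so their difference, the radial orientation, is unchanged up to floating point error.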

Another potentially useful multi-disc RPN scheme would 
involve the use of discs with different neuron densities along their 
arms, which produce higher speed TPs that can reach the memory 
system more rapidly. These parallel high speed TPs could not only 
provide early information for signal normalization but can also be 
used to narrow the range of possible objects the image could rep- 
resent such that more general categories e.g., bird vs. fish can be 
more rapidly determined than trout vs. carp. Recalling that pat- 
terns are represented in a temporal coding memory network as a 
set of signal propagation pathways, signals from the sparsely pop- 
ulated discs can readily be used to deactivate the vast majority 
of the network's pathways which do not match the early low res- 
olution TP, thus saving most of the energy required to check a 
high resolution TP against every known pattern. This ensures a 
highly energy efficient system which rapidly narrows the number 
of possible object candidates with successively greater certainty. 
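
A minimal software sketch of this coarse-to-fine narrowing is given below. It is an illustration only: the sparse disc is emulated by summing groups of adjacent TP samples, stored patterns are held in a plain dictionary, and candidates are ranked by cosine similarity. All of these choices, and the names `downsample_tp` and `coarse_to_fine_match`, are assumptions rather than a description of the temporal coding memory network.

```python
import numpy as np

def downsample_tp(tp, factor):
    """Emulate a sparser disc by summing groups of `factor` adjacent samples
    (an assumed stand-in for a disc with N/factor neurons per arm)."""
    tp = np.asarray(tp, dtype=float)
    usable = len(tp) - len(tp) % factor
    return tp[:usable].reshape(-1, factor).sum(axis=1)

def cosine_similarity(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def coarse_to_fine_match(tp, stored, factor=4, keep=2):
    """Prune candidates with the early low resolution TP, then compare the
    full resolution TP only against the survivors."""
    coarse = downsample_tp(tp, factor)
    coarse_scores = {
        label: cosine_similarity(coarse, downsample_tp(pattern, factor))
        for label, pattern in stored.items()
    }
    survivors = sorted(coarse_scores, key=coarse_scores.get, reverse=True)[:keep]
    best = max(survivors, key=lambda label: cosine_similarity(tp, stored[label]))
    return best, survivors

# Hypothetical stored patterns and a slightly perturbed query.
rng = np.random.default_rng(0)
stored = {name: rng.random(200) for name in ["trout", "carp", "sparrow", "robin"]}
query = stored["trout"] + 0.01 * rng.random(200)
best, survivors = coarse_to_fine_match(query, stored, factor=4, keep=2)
```

Only `keep` full resolution comparisons are performed instead of one per stored pattern, which is the software analog of deactivating the non-matching pathways early.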



As an illustration of such a fan-out feature extraction scheme, 
Figure 8 shows separation of an incident image via parallel, radial 
Gabor filters into multiple feature maps which deliver a higher 
dimension spatio-temporal pattern to the memory network, 
enabling greater selectivity. As examples, radial Gabor filters with
0°, 45°, and 90° orientations relative to the center are shown. Also
illustrated are outputs of discs with N (full), N/2, and N/4 neu- 
rons on each arm, demonstrating the relative temporal order of 
the multi-resolution, spatio-temporal patterns generated. 

The simplicity of such a parallelized, multi-scale system, the
biological evidence for multi-scale visual receptive fields (Itti
et al., 1998; Riesenhuber and Poggio, 1999), the presence of multispeed
pathways in the visual cortex (Loxley et al., 2011) and
the potential impact on energy consumption, the primary limiting
factor for all biological systems, all argue in favor of further
investigation of such multi-scale, multispeed schemes.

RESULTS 

To better illustrate the pertinent characteristics of the RPN we
focus only on the simple one disc case without the added multi-disc
extensions. Although these extensions can bring the system's
performance arbitrarily close to ideal, conceptually they are repetitions
of the simple case and merely make the memory system
more effective by delivering more information in parallel.

VARIANCE OF RPN OUTPUT DUE TO IMAGE TRANSFORMATIONS 

The RPN is robust to rotation and scale. Figure 9 (left) shows
the output TPs resulting from a 200 × 200 pixel image and its
rotated equivalent. The similarity of the resulting 200 point time
FIGURE 8 | RPN producing a spatiotemporal pattern using parallel
radial Gabor filters and discs with varying neuron densities. The
incident image is processed by a filter bank of radial Gabor filters
generating multiple feature maps (3 shown); each of these is then projected
onto three discs with N (full), N/2 (half), and N/4 (quarter) density. The
lower density discs simply generate an earlier low resolution version of the
full TP, which can be used by the memory for normalization or early
categorization. The nine TPs shown illustrate the multiplicative effect of
feature extractors when combined in a fan-out fashion.



series was calculated via Cosine Similarity (cos θ) and Spearman's
Rank Correlation Coefficient (Spear' ρ). As expected, the similarity
metrics were high for rotational transformation. Figure 9 (right)
shows variance due to scale. The generated TPs were normalized
and resampled to fill the time series vector and then compared, emulating
the operation of the time warp invariant memory network.
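
The normalization, resampling and comparison steps can be sketched as follows. The specifics are assumptions made for illustration: the silent tail of a TP is trimmed, the remainder is stretched to a fixed length by linear interpolation, amplitudes are divided by the peak value, and Spearman's coefficient is computed as the Pearson correlation of tie-averaged ranks.

```python
import numpy as np

def normalize_and_resample(tp, length=200):
    """Stretch the active part of a TP to `length` samples and scale its peak to 1."""
    tp = np.asarray(tp, dtype=float)
    active = np.flatnonzero(tp)
    if active.size == 0:
        return np.zeros(length)
    tp = tp[: active[-1] + 1]                      # drop the silent tail
    old_t = np.linspace(0.0, 1.0, num=len(tp))
    new_t = np.linspace(0.0, 1.0, num=length)
    resampled = np.interp(new_t, old_t, tp)        # linear interpolation
    return resampled / np.abs(resampled).max()

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    mean_rank = np.bincount(inverse, weights=ranks) / counts
    return mean_rank[inverse]

def spearman_rho(a, b):
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

# A synthetic TP and a "half scale" version: half as long and weaker.
full = 1.0 + np.sin(np.linspace(0.0, np.pi, 200))
half = np.zeros(200)
half[:100] = 0.25 * full[::2]

a = normalize_and_resample(full)
b = normalize_and_resample(half)
cos_ab = cosine_similarity(a, b)
rho_ab = spearman_rho(a, b)
```

After normalization the shortened, attenuated pattern closely matches the original, which is the behavior expected of a time warp invariant comparison.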

To measure the RPN performance as a function of image rotation,
scale and shift, a mixed set of 300 different 200 × 200 pixel
test images consisting of letters, numbers, words, shapes, faces,
and fish in approximately equal numbers was used; samples are
shown in Figure 10A. All images were high-pass filtered using
a difference of Gaussians kernel and processed by an RPN disc
with 200 spiral arms, each with 200 neurons. The similarity metrics
of Spearman's ρ and cosine similarity were measured for each
image across a range of rotation, translation and scale transforms
with respect to the original image, with the mean values shown in
Figure 10B. Variance as a function of rotation is shown in the left
panel, where the spiral distribution of the disc's 200 arms resulted
in a high level of similarity. The pattern shown from 0 to π/100
radians is repeated, as expected, due to the disc's 200 arms.
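
A simplified software model of a single disc helps show why rotation has so little effect on the TP. The sketch below makes several assumptions that differ from the system described here: the arms are straight rather than spiral, each neuron samples the nearest pixel, the arm angles are offset by half a step so that no sample falls on a pixel boundary, and no difference of Gaussians filtering is applied. The function name `rpn_temporal_pattern` is hypothetical.

```python
import numpy as np

def rpn_temporal_pattern(image, n_arms=200, n_neurons=200):
    """Sum the activity at each radial position over all arms.

    Element k of the result is the summed activity of the k-th neuron
    (counted from the center) on every arm, i.e. the activity that reaches
    the summing neuron at time step k as the image ripples inward.
    """
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_radius = min(cx, cy)
    radii = (np.arange(n_neurons) + 0.5) / n_neurons * max_radius
    angles = (np.arange(n_arms) + 0.5) * 2.0 * np.pi / n_arms
    rr, aa = np.meshgrid(radii, angles, indexing="ij")
    rows = np.clip(np.rint(cy + rr * np.sin(aa)).astype(int), 0, h - 1)
    cols = np.clip(np.rint(cx + rr * np.cos(aa)).astype(int), 0, w - 1)
    return image[rows, cols].sum(axis=1)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# A hypothetical binary test image: an L shape on a 200 x 200 canvas.
image = np.zeros((200, 200))
image[60:90, 70:150] = 1.0
image[90:150, 70:95] = 1.0

tp = rpn_temporal_pattern(image)
tp_rotated = rpn_temporal_pattern(np.rot90(image))   # rotated by 90 degrees
similarity = cosine_similarity(tp, tp_rotated)
```

A rotation by any multiple of 2π/200 maps the set of arms onto itself, so in this model the TP is unchanged at those angles; intermediate rotations shift the samples between arms, and the resulting variation repeats with the arm spacing, in line with the periodic pattern described above.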
FIGURE 9 | Temporal patterns (TPs) from the RPN illustrating
rotational invariance (left) and scale invariance (right). Images were
projected onto the disc at t = 0 and all time scales were normalized to 200
by resampling the TPs and aligning them together. Measures of similarity
between the TPs are given in the form of the Cosine Similarity (cos θ) and
Spearman's Rank Correlation Coefficient (Spear' ρ). Both of these measures
show high degrees of similarity between the images. Images were
projected onto a disc with Φ = 200 arms and N = 200 neurons per arm.
FIGURE 10 | (A) A few of the sample images used to test the RPN's
variance as a function of rotation, scale and translation. (B) Output TP
variance as a function of image rotation. The illustrated pattern is repeated
every π/100; note the small scale on the y axis. (C) Output TP variance as a
function of image rescaling (via nearest neighbor resizing algorithm). (D)
Output TP variance as a function of image translation, demonstrating the
high level of sensitivity.



Figure 10C shows variance with respect to scaling. The system
shows robustness to rescaling down to low scales, where the
nearest neighbor image resizing operation performed to produce
the downscaled images significantly reduced information content.
Figure 10D shows variance due to shift or translational transform.
As would be expected for a global polar transform, the RPN is
sensitive to non-centered images: a 10% shift of a 200 × 200
image (20 pixels) results in a drop of 0.17 and 0.25 in the cosine
and Spearman similarity metrics, respectively.
SPEED OF OPERATION

As a biologically inspired decentralized processor with potential
use in real-world environments, the RPN's speed of operation is
an important design consideration. In the context of speed, the
worst case for the RPN would involve two objects which fully span
the RPN disc and are identical but for a distinguishing feature
located at the disc edge (since activation at the disc edge
takes the longest time to reach the summing neuron). In this case, the TP from a
multi-disc RPN system whose densest discs contained N neurons
on each arm would be delivered to memory in:

$$T_{recognize} = T_{project} + \Delta t_{rpn} \cdot N + T_{mem} \cdot N \qquad (5)$$

where $T_{recognize}$ is the total time needed for image recognition and
$T_{project}$ is the time needed for the image from the retina/sensor to
be projected onto the RPN disc. In the multi-disc case this term
would consist of the time required to generate the most time consuming
feature maps, such as hardware implemented Gabor filters,
resulting in $T_{project} \approx 240$ ns (Chicca et al., 2007). $\Delta t_{rpn}$ is the
time needed for the activation pattern to propagate one step in
from the original activation point on the disc along its spiral arm.
Assuming implementation via a digital relay, $\Delta t_{rpn}$ is on
the order of nanoseconds or smaller. With N, the number of neurons
per arm on the RPN, often around 500, this gives $N \times \Delta t_{rpn}$ as
the total amount of time for image data to be processed on the
disc. $T_{mem}$ is the shortest time needed by the temporal coding
memory network to process one point in the input time series
and can loosely be interpreted as the temporal resolution of the
memory network, as its response times increase linearly with the
duration of the input signal. Thus, given $T_{mem}$ and the length of
a TP, N, and assuming a linear relationship between the length
of the input signal and time to recognition, the response time of
the memory network can be estimated. With $T_{mem}$ being on the
order of 10 ns in current first generation hardware implementations
(Wang et al., 2013), and with the same high N, we obtain an
approximate $T_{recognize}$ on the order of 5 microseconds.

Since temporal coding memory networks are able to process
spatio-temporal patterns as they are being generated by the RPN, the
$\Delta t_{rpn}$ term is effectively eliminated due to the temporal overlap,
which results in:

$$T_{recognize} = T_{project} + T_{mem} \cdot N \qquad (6)$$
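
The estimates of Equations 5 and 6 can be reproduced with a few lines of arithmetic. The values for the projection time, the memory time step and N are those quoted above; the propagation step is stated only as being on the order of nanoseconds or smaller, so the 1 ns used below is an assumed upper-end value, and the function name is hypothetical.

```python
def recognition_time_ns(t_project_ns, dt_rpn_ns, t_mem_ns, n, overlapped=False):
    """Total recognition time in nanoseconds.

    overlapped=False follows Equation 5 (ripple, then memory processing);
    overlapped=True follows Equation 6 (memory processes the TP as it is
    generated, so the ripple term drops out).
    """
    ripple_ns = 0.0 if overlapped else dt_rpn_ns * n
    return t_project_ns + ripple_ns + t_mem_ns * n

T_PROJECT_NS = 240.0   # hardware Gabor filtering, as quoted in the text
DT_RPN_NS = 1.0        # ASSUMED: "on the order of nanoseconds or smaller"
T_MEM_NS = 10.0        # memory network time step, as quoted in the text
N = 500                # neurons per arm, as quoted in the text

sequential_ns = recognition_time_ns(T_PROJECT_NS, DT_RPN_NS, T_MEM_NS, N)
pipelined_ns = recognition_time_ns(T_PROJECT_NS, DT_RPN_NS, T_MEM_NS, N,
                                   overlapped=True)
# sequential_ns -> 5740.0 (about 5.7 microseconds)
# pipelined_ns  -> 5240.0 (about 5.2 microseconds)
```

With these values the memory term dominates both estimates, so the overall figure remains on the order of 5 microseconds whether or not the ripple time is overlapped.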

Signal processing programs running on sequential von Neumann
machines require computation times on the order of several milliseconds
just to convert Cartesian images into log-polar images
while consuming significant computational resources (Chessa
et al., 2011). Hardware implemented log-polar solutions provide
significantly higher speed than mapping techniques (Traver and
Bernardino, 2010). However, unlike the RPN's rippling operation,
which delivers processed TPs to memory, the log-polar foveated
systems operate essentially as simple sensor grids and must interface
with conventional sequential processors, introducing bottlenecks
that distributed memory systems avoid. Other hardware
implementations can partially bypass this bottleneck by using
processor per pixel architectures or convolution networks, resulting
in very high speeds (Dudek, 2005; Perez-Carrasco et al., 2013)
that would rival the proposed RPN solution and its extensions in
speed. The drawback of these implementations, however, is their
lack of full view invariance.

DISCUSSION 

THE RPN APPROACH CAN BE EXTENDED TO THREE DIMENSIONS 

All the features of the RPN work equally well in three dimensions, 
and can just as easily recognize reconstructed 3D "images." In this 
context the disc is replaced by a sphere with the three dimensional 
image being mapped into the sphere, rippling inwards and being 
integrated at the center. Here the sphere does not necessarily refer 
to the physical shape of the network but to the conceptual struc- 
ture of the connectivity, with a highly connected sphere center and 
radiating connectivity out to an integrating layer of neurons on 
the sphere surface. Given a 3D projection of an object within the 
sphere, skew invariant recognition could also be realized, which is 
among the most difficult challenges in image recognition (Pinto
et al., 2008).

The reconstruction of 3D images in artificial systems is a
well-developed field (Faugeras, 1993). In contrast, the underlying
mechanisms performing this 3D information representation task
in humans are still an area of active research (Bulthoff et al., 1995;
Fang and Grossberg, 2009), where the evidence points to complex
interactions between multiple mechanisms.

FRAME BASED VISUAL RECOGNITION IN A BIOLOGICAL CONTEXT 

As detailed earlier, the RPN's conversion of 2D (or 3D) data into
1D TPs requires that the incident image be presented to the RPN
disc in the form of near simultaneous wave fronts of neuronal
activity, or frames, such that new incoming sensory information
does not corrupt processing being done on the current image.
It should be noted that frames are here defined as the coalescing
of temporally distant information across a multi-channel pathway
into narrower repeating temporal windows, as illustrated in Figure 11.
No statement is made about precise periodicity or precise synchrony, as
the "framed spike" output of the neural phase lock system block in
Figure 11 illustrates.
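
The notion of a frame used here can be made concrete with a small sketch in which asynchronous spikes on several channels are held until the next release time and then emitted together. The release times are deliberately unequal, reflecting the absence of precise periodicity. The event representation, the hold-until-release rule and the name `bunch_into_frames` are assumptions made for illustration; they are not a model of the neural phase lock mechanism.

```python
import numpy as np

def bunch_into_frames(spike_times, spike_channels, release_times, n_channels):
    """Coalesce asynchronous spikes into frames.

    Each spike is held until the first release time at or after its own
    time. Row i of the result counts, per channel, the spikes emitted
    together at release_times[i]. Spikes later than the last release time
    are left out.
    """
    spike_times = np.asarray(spike_times, dtype=float)
    spike_channels = np.asarray(spike_channels, dtype=int)
    release_times = np.asarray(release_times, dtype=float)

    frame_index = np.searchsorted(release_times, spike_times, side="left")
    valid = frame_index < len(release_times)

    frames = np.zeros((len(release_times), n_channels))
    np.add.at(frames, (frame_index[valid], spike_channels[valid]), 1.0)
    return frames

# Five asynchronous spikes on three channels, three unequal release times.
frames = bunch_into_frames(
    spike_times=[0.1, 0.4, 0.9, 1.2, 2.5],
    spike_channels=[0, 1, 0, 2, 1],
    release_times=[1.0, 2.3, 3.1],
    n_channels=3,
)
```

Spikes that were up to 0.8 time units apart on the input side leave together in the first frame, which is the sense in which temporally distant information is brought into a narrower window.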

This frame-based operation of the RPN makes it more useful
from a hardware implementation perspective, but appears to
detract from its biological plausibility, prompting a search for a
frameless solution. Yet despite attempts to eliminate the framing
requirement, to date every proposed and implemented recognition
system, including those with the express goal of performing
frameless event-based visual processing, such as those based
on frameless vision sensors, has had to introduce some variant
of a frame-based approach when attempting recognition. Although
the approach tends to acquire different names along the
way, the final result presented to the downstream memory system
is the convergence of temporally distant information by the partial
slowing or arresting of the leading segments of the incoming
signal (Zelnik-Manor and Irani, 2001; Lazar and Pnevmatikakis,
2011; Farabet et al., 2012; Wiesmann et al., 2012; Perez-Carrasco
et al., 2013).

However, this failure may speak more to the inherent nature of 
the visual recognition problem than any lack of human ingenu- 
ity. Increasing evidence from neuroscience points to functional 
FIGURE 11 | A simplified illustration of a decentralized frame producing
system without a sample and hold operation. (A) A 2D image is
projected onto the retina. (B) A 1D slice of the retina is illustrated. (C) The
sensors on the retina produce spikes in an asynchronous, stochastic
fashion. Each spike channel represents the path of a single "pixel" from the
2D retina through the visual system. (D) The asynchronous activations
travel through a neural phase lock mechanism that bunches temporal
patterns into frame-like wave fronts of activity around Local Field Potentials
(LFPs) (Martinez, 2005). (E) The resulting image "frames" are projected
onto the RPN as described earlier. (F) The resulting output TPs are generated
for recognition by the memory system. Note the unequal inter-frame times
that would be produced due to the unclocked nature of a biological system.



synchronicity being present in the visual cortex in the form of
synchronized gamma waves where one might hypothesize an
RPN or other recognition system to exist. The function of this
synchronicity has been attributed to the unification of related
elements in the visual field, an effect especially pronounced
with attention (Meador et al., 2002; Van Rullen et al., 2007;
Buschman and Miller, 2009; Fries, 2009; Gregoriou et al., 2009;
Dehaene and Changeux, 2011). Furthermore, the mechanisms
proposed to explain such observed waves correspond to a more
distributed analog of the RPN's inhibitory neuron (Martinez,
2005), namely the inhibitory lateral and feedback connections
that clump related, but spatially distant, information into compact
wavefronts separated by periods of inactivity. This convergence
from separate fields may be pointing to the usefulness of temporal
synchrony for visual inputs in the context of recognition (Seth
et al., 2004).

THE SHORTCOMINGS OF THE RPN MOTIVATE A MORE GENERAL
SOLUTION

A significant drawback of the RPN, and one it shares with
log-polar and other approaches, is the need for precise centering
of a salient object by an unexplained salience detection system.
This system not only needs to detect objects of interest but, more
challengingly, must shift the image onto the RPN disc. Within
the framework of centralized processing systems, the problem
of shifting an image by an arbitrary value is trivial; however, in
the context of distributed networks with no central control, the
task is particularly challenging. A proposed solution is the use of
dynamic routing systems (Olshausen et al., 1993; Postma et al.,
1997) where a series of route controlling units transport the input
image to a hypothetical central recognition aperture like that of
the RPN disc. However, decades of neuroscientific research on
the visual system have failed to find any evidence for such an aperture.
Furthermore, the switching speeds required to operate such
control systems are far too high to be biologically achievable, yet
humans and animals are manifestly capable of rapid recognition
of objects that are not centered in their field of view, making the
naive centralized solution unlikely (van Hemmen and Sejnowski,
2005). This motivates investigation of a distributed solution to
the salience detection/image centering black box. The RPN, unlike
previous approaches using 2D memory, can easily be extended
from a global image-to-TP transform to a localized operator that
converts local images to local TPs, such that the RPN disc can
be constructed dynamically anywhere in the field of view from
the gradient of the salience map, enabling rapid, view invariant,
multi-object recognition.

CONCLUSION 

In this paper we have introduced the RPN system, a simple bio- 
logically inspired view invariant transformation that is hardware 
implementable, and capable of converting 2D images to spatio- 
temporal patterns suitable for recognition by temporal coding 
memory networks. We described a few of the ways in which the RPN
can be utilized and its relationship to biological systems, and
detailed its shortcomings. With these as motivation we outlined
the requirements that a more general solution would need to 
meet in order to be biologically plausible and useful in real world 
environments. 